A Large-Scale Fault-Tolerant Distributed Software-Build Process

Author

  • Jim Buffenbarger
Abstract

A large software system can be compiled and linked more quickly if its build process is distributed across a network of multiple computers. However, large networks are more likely to contain a computer that causes a build to fail. If such a computer can be identified, it can be excluded from participation. Otherwise, if the failed command can be detected, it can be retried on a different computer. We describe our experiences designing, implementing, and maintaining a fault-tolerant distributed build process for an industrial software-development environment. We focus on techniques that augment the capabilities of available distributed-build tools. Our build process produces Hewlett-Packard’s laser printer firmware. Our environment includes hundreds of engineers, about one thousand computers, and about two million lines of code. As an example of the speedup provided by distribution, a forced sequential rebuild of all targets requiring about 155 minutes can be accomplished concurrently in about 35 minutes.
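The retry idea in the abstract (skip computers known to be bad, and rerun a failed command on a different computer) can be illustrated with a minimal sketch. This is not the paper's implementation: the function `run_with_retry`, the `run_on` callback, the `excluded` set, and the `max_attempts` limit are all hypothetical names chosen for illustration.

```python
from typing import Callable, Iterable, List, Optional, Set, Tuple


def run_with_retry(
    command: str,
    hosts: Iterable[str],
    run_on: Callable[[str, str], bool],
    excluded: Optional[Set[str]] = None,
    max_attempts: int = 3,
) -> Tuple[Optional[str], List[str]]:
    """Run `command` on the first host that succeeds.

    Hypothetical helper: `run_on(host, command)` is assumed to return
    True on success and False on a detected failure.
    Returns (host that succeeded or None, list of hosts actually tried).
    """
    excluded = excluded or set()
    tried: List[str] = []
    for host in hosts:
        # Hosts already identified as faulty never participate.
        if host in excluded:
            continue
        # Give up once the retry budget is spent.
        if len(tried) >= max_attempts:
            break
        tried.append(host)
        if run_on(host, command):
            return host, tried
        # Detected failure: fall through and retry on the next host.
    return None, tried
```

A caller would supply its own `run_on`, for example a wrapper around a remote-execution tool, and could add a host to `excluded` once it is identified as a repeated source of failures.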


Similar resources

An Agreement Service for Implementing Fault Tolerant Distributed Software

Distributed systems include a large number of processors, which increases the risk of failures. Fault tolerance is of key importance in such systems. Implementing fault-tolerant distributed software (FTDS) is a difficult task [2]. Group communication services [8] such as group membership and reliable multicast have been proposed to solve some of the problems in implementing FTDS. In this paper w...

Somersault: Enabling Fault-Tolerant Distributed Software Systems

Keywords: fault-tolerant, CORBA, process replication, process mirroring, high availability

Somersault is a platform for developing distributed fault-tolerant software components and integrating these critical components with other components into distributed system solutions. Critical application processes are mirrored across a network, with each critical process being replicated in a primary and seconda...

A New Proactive Fault Tolerant Approach for Scheduling in Computational Grid

Grid Computing provides non-trivial services to users and aggregates the power of widely distributed resources. Computational grids solve large scale scientific problems using distributed heterogeneous resources. The Grid Scheduler must select proper resources for executing the tasks with less response time and without missing the deadline. There are various reasons such as network failure, ove...

Bio-inspired Fault Tolerant and Adaptive System Modeling and Simulation on the Grid

Grid computing, which is characterized by large-scale sharing of and cooperation among distributed resources, is becoming a mainstream technology in distributed computing. In this paper, we present the idea of applying grid-computing technology to model and simulate large-scale, high-performance, bio-inspired fault-tolerant and adaptable control systems. Grid-based workflow management service is employed...

Somersault Software Fault-Tolerance

Keywords: software fault-tolerance, process replication, failure masking, continuous availability, topology

The ambition of fault-tolerant systems is to provide application transparent fault-tolerance at the same performance as a non-fault-tolerant system. Somersault is a library for developing distributed fault-tolerant software systems that comes close to achieving both goals. We describe Somersault and...

Publication date: 2005